Troubleshoot Availability Group Exceeded RTO or RPO issue

Introduction

In this lab, you will try to troubleshoot the reason why you are not able to retrieve data when querying a readable secondary replica database. The goal is to identify the cause and fix the issue.

Objectives

At the end of this lab, you will be able to:

  • Understand the reason why you are not able to retrieve the data
  • Fix the issue and resolve the slow performance.

Estimated Time

45 minutes

Logon Information

Use the following credentials to login into virtual environment

  • Username: corpnet\cluadmin
  • Password: Pa$$w0rd

Before starting the training module, we recommend that you launch the labs and give them some time to stabilize. Please be aware that sometimes the AG may be in a resolving state and AG Replicas may be in a disconnected state. This is a platform issue and should stabilize after a few minutes

Environment review

Before you begin with the first exercise in the lab, let's review the lab environment.

  • In the lab, you have one Domain Controller and 3 nodes + 1 client computer.

  • AlwaysOnN1 and AlwaysOnN2 nodes are in the primary Datacenter.

  • AlwaysOnN3 is in the secondary datacenter.

  • AlwaysOnN1 and AlwaysOnN2 are configured in synchronous commit mode

  • For this lab, both the datacenters are in the same subnet.

  • Each node has Windows Server 2022 O/S installed.

  • SQL Server 2022 Standalone instances are installed on all the 3 nodes (i.e. AlwaysOnN1, AlwaysOnN2, AlwaysOnN3).

  • SQL Service logon account details

    • SQL Service Logon account: corpnet\SQLSvc
    • Password: Pa$$w0rd

Back to the list of Labs

Exercise 1: Connect to the Primary and Secondary Replicas and execute the workload

In this exercise, you will learn how to Connect to the Primary and Secondary Replicas and execute the workload.


Tasks

  1. Connect to the Primary Replica and Execute the workload

    Perform this task on the virtual machine AlwaysOnClient.

  2. Open SSMS and connect to AGCorpListen, this is the listener for the Availability Group deployed between AlwaysOnN1, AlwaysOnN2 and AlwaysOnN3 replicas for the database AdventureWorks.

  3. After connecting to the listener, review the primary and secondary replicas under the AGCorp Availability Group. In the below example, AlwaysOnN2 is the primary replica and AlwaysOnN1 and AlwaysOnN3 are the secondaries.

    impage0076a.png

    Ensure that AlwaysOnN2 is the primary replica. If it is not the primary replica, then perform a manual failover to make it primary before running the workload.

  4. Start the workload by double-clicking the file C:\scripts\Runworkload_Primary_1.cmd

  5. Start another workload by double-clicking the file C:\scripts\Runworkload_Secondary_1.cmd

  6. Start the workload by double-clicking the file C:\scripts\Runworkload_Primary_2.cmd

    Make sure you are running the workloads in the same order as above.

  7. Query the Demotest table on the primary. You should be able to see data. But when you query the Demotest table on the secondary, you are not able to see the results.

Congratulations!

You have successfully completed this exercise. Click Next to advance to the next exercise.

Exercise 2: Identify the cause of the failure and fix the issue

In this exercise, you will learn how to Identify the cause of the failure and fix the issue.


Tasks

  1. Find the root cause using the guidelines and cheat sheet provided below.

Guidelines

  1. This is a non-guided activity and the attendees are expected to try and troubleshoot this issue on their own.

  2. You can use any resources (including the internet or your own scripts), to troubleshoot the issue.

  3. You can use the tools discussed in the first module to help troubleshoot the issue.

  4. The possible causes discussed earlier in the lesson can be used as guidance for troubleshooting.

  5. The instructor will discuss the troubleshooting steps, cause and solution in detail after this lab session.

  6. You might have to login directly on the individual nodes to troubleshoot the issue.

  7. Ask yourself the below questions:

    • Where do you start?
    • What logs will provide me additional insights?
    • What tools can I use to troubleshoot this issue?
    • Could one or more of the possible causes discussed earlier be the issue here?

Cheat Sheet

  1. Tools to help you troubleshoot the issue

    This is not a complete list of commands/tools to help troubleshoot this issue. There are various ways/methods/approaches to troubleshoot an issue. These commands/tools referenced here could be used to look up/identify useful information for this lab.

    DMV's and System Views

    • sys.dm_hadr_database_replica_states
    • sys.dm_hadr_availability_replica_states
    • sys.availability_replicas
    • sys.dm_os_waiting_tasks
    • sys.dm_os_wait_stats
    • sys.dm_exec_requests

Congratulations!

You have successfully completed this exercise. Click Next to advance to the next lab.